Skip to content

[AArch64][PAC] Introduce AArch64::PAC pseudo instruction #146488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

atrosinenko
Copy link
Contributor

Introduce a pseudo instruction to be selected instead of a pair of
MOVKXi and PAC[DI][AB] carrying address and immediate modifiers
as separate operands. The new pseudo instruction is expanded in
AsmPrinter, so that MOVKXi is emitted immediately before PAC[DI][AB].
This way, an attacker cannot control the immediate modifier used to sign
the value, even if address modifier can be substituted.

To simplify the instruction selection, select AArch64::PAC pseudo using
TableGen pattern and post-process its $AddrDisc operand by custom
inserter hook - this eliminates duplication of the logic for DAGISel
and GlobalISel. Furthermore, this improves cross-BB analysis in case of
DAGISel.

Copy link
Contributor Author

atrosinenko commented Jul 1, 2025

@llvmbot
Copy link
Member

llvmbot commented Jul 1, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Anatoly Trosinenko (atrosinenko)

Changes

Introduce a pseudo instruction to be selected instead of a pair of
MOVKXi and PAC[DI][AB] carrying address and immediate modifiers
as separate operands. The new pseudo instruction is expanded in
AsmPrinter, so that MOVKXi is emitted immediately before PAC[DI][AB].
This way, an attacker cannot control the immediate modifier used to sign
the value, even if address modifier can be substituted.

To simplify the instruction selection, select AArch64::PAC pseudo using
TableGen pattern and post-process its $AddrDisc operand by custom
inserter hook - this eliminates duplication of the logic for DAGISel
and GlobalISel. Furthermore, this improves cross-BB analysis in case of
DAGISel.


Full diff: https://github.com/llvm/llvm-project/pull/146488.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+32)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+74)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+7)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+20-1)
  • (added) llvm/test/CodeGen/AArch64/ptrauth-isel.ll (+205)
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index a16c104d8bef5..40890dbe6e532 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -171,6 +171,9 @@ class AArch64AsmPrinter : public AsmPrinter {
   // Emit the sequence for AUT or AUTPAC.
   void emitPtrauthAuthResign(const MachineInstr *MI);
 
+  // Emit the sequence for PAC.
+  void emitPtrauthSign(const MachineInstr *MI);
+
   // Emit the sequence to compute the discriminator.
   //
   // ScratchReg should be x16/x17.
@@ -2173,6 +2176,31 @@ void AArch64AsmPrinter::emitPtrauthAuthResign(const MachineInstr *MI) {
     OutStreamer->emitLabel(EndSym);
 }
 
+void AArch64AsmPrinter::emitPtrauthSign(const MachineInstr *MI) {
+  Register Val = MI->getOperand(1).getReg();
+  auto Key = (AArch64PACKey::ID)MI->getOperand(2).getImm();
+  uint64_t Disc = MI->getOperand(3).getImm();
+  Register AddrDisc = MI->getOperand(4).getReg();
+  bool AddrDiscKilled = MI->getOperand(4).isKill();
+
+  // Compute aut discriminator into x17
+  assert(isUInt<16>(Disc));
+  Register DiscReg = emitPtrauthDiscriminator(
+      Disc, AddrDisc, AArch64::X17, /*MayUseAddrAsScratch=*/AddrDiscKilled);
+  bool IsZeroDisc = DiscReg == AArch64::XZR;
+  unsigned Opc = getPACOpcodeForKey(Key, IsZeroDisc);
+
+  //  paciza x16      ; if  IsZeroDisc
+  //  pacia x16, x17  ; if !IsZeroDisc
+  MCInst PACInst;
+  PACInst.setOpcode(Opc);
+  PACInst.addOperand(MCOperand::createReg(Val));
+  PACInst.addOperand(MCOperand::createReg(Val));
+  if (!IsZeroDisc)
+    PACInst.addOperand(MCOperand::createReg(DiscReg));
+  EmitToStreamer(*OutStreamer, PACInst);
+}
+
 void AArch64AsmPrinter::emitPtrauthBranch(const MachineInstr *MI) {
   bool IsCall = MI->getOpcode() == AArch64::BLRA;
   unsigned BrTarget = MI->getOperand(0).getReg();
@@ -2867,6 +2895,10 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     emitPtrauthAuthResign(MI);
     return;
 
+  case AArch64::PAC:
+    emitPtrauthSign(MI);
+    return;
+
   case AArch64::LOADauthptrstatic:
     LowerLOADauthptrstatic(*MI);
     return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 1f98d69edb473..9e00905b3fec8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3071,6 +3071,75 @@ AArch64TargetLowering::EmitGetSMESaveSize(MachineInstr &MI,
   return BB;
 }
 
+// Helper function to find the instruction that defined a virtual register.
+// If unable to find such instruction, returns nullptr.
+static MachineInstr *stripVRegCopies(const MachineRegisterInfo &MRI,
+                                     Register Reg) {
+  while (Reg.isVirtual()) {
+    MachineInstr *DefMI = MRI.getVRegDef(Reg);
+    assert(DefMI && "Virtual register definition not found");
+    unsigned Opcode = DefMI->getOpcode();
+
+    if (Opcode == AArch64::COPY) {
+      Reg = DefMI->getOperand(1).getReg();
+      // Vreg is defined by copying from physreg.
+      if (Reg.isPhysical())
+        return DefMI;
+      continue;
+    }
+    if (Opcode == AArch64::SUBREG_TO_REG) {
+      Reg = DefMI->getOperand(2).getReg();
+      continue;
+    }
+
+    return DefMI;
+  }
+  return nullptr;
+}
+
+void AArch64TargetLowering::fixupBlendComponents(
+    MachineInstr &MI, MachineBasicBlock *BB, MachineOperand &IntDiscOp,
+    MachineOperand &AddrDiscOp, const TargetRegisterClass *AddrDiscRC) const {
+  const TargetInstrInfo *TII = Subtarget->getInstrInfo();
+  MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+  const DebugLoc &DL = MI.getDebugLoc();
+
+  Register AddrDisc = AddrDiscOp.getReg();
+  int64_t IntDisc = IntDiscOp.getImm();
+
+  assert(IntDisc == 0 && "Blend components are already expanded");
+
+  MachineInstr *MaybeBlend = stripVRegCopies(MRI, AddrDisc);
+
+  if (MaybeBlend) {
+    // Detect blend(addr, imm) which is lowered as "MOVK addr, #imm, #48".
+    // Alternatively, recognize small immediate modifier passed via VReg.
+    if (MaybeBlend->getOpcode() == AArch64::MOVKXi &&
+        MaybeBlend->getOperand(3).getImm() == 48) {
+      AddrDisc = MaybeBlend->getOperand(1).getReg();
+      IntDisc = MaybeBlend->getOperand(2).getImm();
+    } else if (MaybeBlend->getOpcode() == AArch64::MOVi32imm &&
+               isUInt<16>(MaybeBlend->getOperand(1).getImm())) {
+      AddrDisc = AArch64::XZR;
+      IntDisc = MaybeBlend->getOperand(1).getImm();
+    }
+  }
+
+  // Normalize NoRegister operands emitted by GlobalISel.
+  if (AddrDisc == AArch64::NoRegister)
+    AddrDisc = AArch64::XZR;
+
+  // Make sure AddrDisc operand respects the register class imposed by MI.
+  if (AddrDisc != AArch64::XZR && MRI.getRegClass(AddrDisc) != AddrDiscRC) {
+    Register TmpReg = MRI.createVirtualRegister(AddrDiscRC);
+    BuildMI(*BB, MI, DL, TII->get(AArch64::COPY), TmpReg).addReg(AddrDisc);
+    AddrDisc = TmpReg;
+  }
+
+  AddrDiscOp.setReg(AddrDisc);
+  IntDiscOp.setImm(IntDisc);
+}
+
 MachineBasicBlock *AArch64TargetLowering::EmitInstrWithCustomInserter(
     MachineInstr &MI, MachineBasicBlock *BB) const {
 
@@ -3169,6 +3238,11 @@ MachineBasicBlock *AArch64TargetLowering::EmitInstrWithCustomInserter(
     return EmitZTInstr(MI, BB, AArch64::ZERO_T, /*Op0IsDef=*/true);
   case AArch64::MOVT_TIZ_PSEUDO:
     return EmitZTInstr(MI, BB, AArch64::MOVT_TIZ, /*Op0IsDef=*/true);
+
+  case AArch64::PAC:
+    fixupBlendComponents(MI, BB, MI.getOperand(3), MI.getOperand(4),
+                         &AArch64::GPR64noipRegClass);
+    return BB;
   }
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 89f90ee2b7707..3f71daa4daa91 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -182,6 +182,13 @@ class AArch64TargetLowering : public TargetLowering {
   MachineBasicBlock *EmitGetSMESaveSize(MachineInstr &MI,
                                         MachineBasicBlock *BB) const;
 
+  /// Replace (0, vreg) discriminator components with the operands of blend
+  /// or with (immediate, XZR) when possible.
+  void fixupBlendComponents(MachineInstr &MI, MachineBasicBlock *BB,
+                            MachineOperand &IntDiscOp,
+                            MachineOperand &AddrDiscOp,
+                            const TargetRegisterClass *AddrDiscRC) const;
+
   MachineBasicBlock *
   EmitInstrWithCustomInserter(MachineInstr &MI,
                               MachineBasicBlock *MBB) const override;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 0f3f24f0853c9..98ba694551d9f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -2025,7 +2025,7 @@ let Predicates = [HasPAuth] in {
     def DZB  : SignAuthZero<prefix_z,  0b11, !strconcat(asm, "dzb"), op>;
   }
 
-  defm PAC : SignAuth<0b000, 0b010, "pac", int_ptrauth_sign>;
+  defm PAC : SignAuth<0b000, 0b010, "pac", null_frag>;
   defm AUT : SignAuth<0b001, 0b011, "aut", null_frag>;
 
   def XPACI : ClearAuth<0, "xpaci">;
@@ -2131,6 +2131,25 @@ let Predicates = [HasPAuth] in {
     let Uses = [X16];
   }
 
+  // PAC pseudo instruction. Is AsmPrinter, it is expanded into an actual PAC*
+  // instruction immediately preceded by the discriminator computation.
+  // This enforces the expected immediate modifier is used for signing, even
+  // if an attacker is able to substitute AddrDisc.
+  def PAC : Pseudo<(outs GPR64:$SignedVal),
+                   (ins GPR64:$Val, i32imm:$Key, i64imm:$Disc, GPR64noip:$AddrDisc),
+                   [], "$SignedVal = $Val">, Sched<[WriteI, ReadI]> {
+    let isCodeGenOnly = 1;
+    let hasSideEffects = 0;
+    let mayStore = 0;
+    let mayLoad = 0;
+    let Size = 12;
+    let Defs = [X17];
+    let usesCustomInserter = 1;
+  }
+
+  def : Pat<(int_ptrauth_sign GPR64:$Val, timm:$Key, GPR64noip:$AddrDisc),
+            (PAC GPR64:$Val, $Key, 0, GPR64noip:$AddrDisc)>;
+
   // AUT and re-PAC a value, using different keys/data.
   // This directly manipulates x16/x17, which are the only registers the OS
   // guarantees are safe to use for sensitive operations.
diff --git a/llvm/test/CodeGen/AArch64/ptrauth-isel.ll b/llvm/test/CodeGen/AArch64/ptrauth-isel.ll
new file mode 100644
index 0000000000000..89ce439f5b47b
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ptrauth-isel.ll
@@ -0,0 +1,205 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple arm64e-apple-darwin             -verify-machineinstrs -stop-after=finalize-isel -global-isel=0 \
+; RUN:     | FileCheck %s --check-prefixes=DAGISEL
+; RUN: llc < %s -mtriple arm64e-apple-darwin             -verify-machineinstrs -stop-after=finalize-isel -global-isel=1 -global-isel-abort=1 \
+; RUN:     | FileCheck %s --check-prefixes=GISEL
+; RUN: llc < %s -mtriple aarch64-linux-gnu -mattr=+pauth -verify-machineinstrs -stop-after=finalize-isel -global-isel=0 \
+; RUN:     | FileCheck %s --check-prefixes=DAGISEL
+; RUN: llc < %s -mtriple aarch64-linux-gnu -mattr=+pauth -verify-machineinstrs -stop-after=finalize-isel -global-isel=1 -global-isel-abort=1 \
+; RUN:     | FileCheck %s --check-prefixes=GISEL
+
+; Check MIR produced by the instruction selector to validate properties that
+; cannot be reliably tested by only inspecting the final asm output.
+
+@discvar = dso_local global i64 0
+
+; Make sure the components of blend(addr, imm) are recognized and passed to
+; PAC pseudo via separate operands to prevent substitution of the immediate
+; modifier.
+;
+; MIR output of the instruction selector is inspected, as it is hard to reliably
+; distinguish MOVKXi immediately followed by a pseudo from a standalone pseudo
+; instruction carrying address and immediate modifiers in its separate operands
+; by only observing the final asm output.
+
+define i64 @small_imm_disc(i64 %addr) {
+  ; DAGISEL-LABEL: name: small_imm_disc
+  ; DAGISEL: bb.0.entry:
+  ; DAGISEL-NEXT:   liveins: $x0
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; DAGISEL-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 42
+  ; DAGISEL-NEXT:   [[SUBREG_TO_REG:%[0-9]+]]:gpr64noip = SUBREG_TO_REG 0, killed [[MOVi32imm]], %subreg.sub_32
+  ; DAGISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 42, killed $xzr, implicit-def dead $x17
+  ; DAGISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; DAGISEL-NEXT:   RET_ReallyLR implicit $x0
+  ;
+  ; GISEL-LABEL: name: small_imm_disc
+  ; GISEL: bb.1.entry:
+  ; GISEL-NEXT:   liveins: $x0
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; GISEL-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 42
+  ; GISEL-NEXT:   [[SUBREG_TO_REG:%[0-9]+]]:gpr64noip = SUBREG_TO_REG 0, [[MOVi32imm]], %subreg.sub_32
+  ; GISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 42, $xzr, implicit-def dead $x17
+  ; GISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; GISEL-NEXT:   RET_ReallyLR implicit $x0
+entry:
+  %signed = call i64 @llvm.ptrauth.sign(i64 %addr, i32 2, i64 42)
+  ret i64 %signed
+}
+
+define i64 @large_imm_disc_wreg(i64 %addr) {
+  ; DAGISEL-LABEL: name: large_imm_disc_wreg
+  ; DAGISEL: bb.0.entry:
+  ; DAGISEL-NEXT:   liveins: $x0
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; DAGISEL-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 12345678
+  ; DAGISEL-NEXT:   [[SUBREG_TO_REG:%[0-9]+]]:gpr64noip = SUBREG_TO_REG 0, killed [[MOVi32imm]], %subreg.sub_32
+  ; DAGISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 0, killed [[SUBREG_TO_REG]], implicit-def dead $x17
+  ; DAGISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; DAGISEL-NEXT:   RET_ReallyLR implicit $x0
+  ;
+  ; GISEL-LABEL: name: large_imm_disc_wreg
+  ; GISEL: bb.1.entry:
+  ; GISEL-NEXT:   liveins: $x0
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; GISEL-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 12345678
+  ; GISEL-NEXT:   [[SUBREG_TO_REG:%[0-9]+]]:gpr64noip = SUBREG_TO_REG 0, [[MOVi32imm]], %subreg.sub_32
+  ; GISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 0, [[SUBREG_TO_REG]], implicit-def dead $x17
+  ; GISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; GISEL-NEXT:   RET_ReallyLR implicit $x0
+entry:
+  %signed = call i64 @llvm.ptrauth.sign(i64 %addr, i32 2, i64 12345678)
+  ret i64 %signed
+}
+
+define i64 @large_imm_disc_xreg(i64 %addr) {
+  ; DAGISEL-LABEL: name: large_imm_disc_xreg
+  ; DAGISEL: bb.0.entry:
+  ; DAGISEL-NEXT:   liveins: $x0
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; DAGISEL-NEXT:   [[MOVi64imm:%[0-9]+]]:gpr64noip = MOVi64imm 123456789012345
+  ; DAGISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 0, killed [[MOVi64imm]], implicit-def dead $x17
+  ; DAGISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; DAGISEL-NEXT:   RET_ReallyLR implicit $x0
+  ;
+  ; GISEL-LABEL: name: large_imm_disc_xreg
+  ; GISEL: bb.1.entry:
+  ; GISEL-NEXT:   liveins: $x0
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; GISEL-NEXT:   [[MOVi64imm:%[0-9]+]]:gpr64noip = MOVi64imm 123456789012345
+  ; GISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 0, [[MOVi64imm]], implicit-def dead $x17
+  ; GISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; GISEL-NEXT:   RET_ReallyLR implicit $x0
+entry:
+  %signed = call i64 @llvm.ptrauth.sign(i64 %addr, i32 2, i64 123456789012345)
+  ret i64 %signed
+}
+
+define i64 @blend_and_sign_same_bb(i64 %addr) {
+  ; DAGISEL-LABEL: name: blend_and_sign_same_bb
+  ; DAGISEL: bb.0.entry:
+  ; DAGISEL-NEXT:   liveins: $x0
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; DAGISEL-NEXT:   [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @discvar
+  ; DAGISEL-NEXT:   [[LDRXui:%[0-9]+]]:gpr64 = LDRXui killed [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @discvar :: (dereferenceable load (s64) from @discvar)
+  ; DAGISEL-NEXT:   [[MOVKXi:%[0-9]+]]:gpr64noip = MOVKXi [[LDRXui]], 42, 48
+  ; DAGISEL-NEXT:   [[COPY1:%[0-9]+]]:gpr64noip = COPY [[LDRXui]]
+  ; DAGISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 42, killed [[COPY1]], implicit-def dead $x17
+  ; DAGISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; DAGISEL-NEXT:   RET_ReallyLR implicit $x0
+  ;
+  ; GISEL-LABEL: name: blend_and_sign_same_bb
+  ; GISEL: bb.1.entry:
+  ; GISEL-NEXT:   liveins: $x0
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; GISEL-NEXT:   [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @discvar
+  ; GISEL-NEXT:   [[LDRXui:%[0-9]+]]:gpr64 = LDRXui [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @discvar :: (dereferenceable load (s64) from @discvar)
+  ; GISEL-NEXT:   [[MOVKXi:%[0-9]+]]:gpr64noip = MOVKXi [[LDRXui]], 42, 48
+  ; GISEL-NEXT:   [[COPY1:%[0-9]+]]:gpr64noip = COPY [[LDRXui]]
+  ; GISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 42, [[COPY1]], implicit-def dead $x17
+  ; GISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; GISEL-NEXT:   RET_ReallyLR implicit $x0
+entry:
+  %addrdisc = load i64, ptr @discvar
+  %disc = call i64 @llvm.ptrauth.blend(i64 %addrdisc, i64 42)
+  %signed = call i64 @llvm.ptrauth.sign(i64 %addr, i32 2, i64 %disc)
+  ret i64 %signed
+}
+
+; In the below test cases both %addrdisc and %disc are computed (i.e. they are
+; neither global addresses, nor function arguments) in a different basic block,
+; making them harder to express via ISD::PtrAuthGlobalAddress.
+
+define i64 @blend_and_sign_different_bbs(i64 %addr, i64 %cond) {
+  ; DAGISEL-LABEL: name: blend_and_sign_different_bbs
+  ; DAGISEL: bb.0.entry:
+  ; DAGISEL-NEXT:   successors: %bb.1(0x50000000), %bb.2(0x30000000)
+  ; DAGISEL-NEXT:   liveins: $x0, $x1
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x1
+  ; DAGISEL-NEXT:   [[COPY1:%[0-9]+]]:gpr64 = COPY $x0
+  ; DAGISEL-NEXT:   [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @discvar
+  ; DAGISEL-NEXT:   [[LDRXui:%[0-9]+]]:gpr64 = LDRXui killed [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @discvar :: (dereferenceable load (s64) from @discvar)
+  ; DAGISEL-NEXT:   [[MOVKXi:%[0-9]+]]:gpr64 = MOVKXi [[LDRXui]], 42, 48
+  ; DAGISEL-NEXT:   [[COPY2:%[0-9]+]]:gpr64noip = COPY [[MOVKXi]]
+  ; DAGISEL-NEXT:   CBZX [[COPY]], %bb.2
+  ; DAGISEL-NEXT:   B %bb.1
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT: bb.1.next:
+  ; DAGISEL-NEXT:   successors: %bb.2(0x80000000)
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT:   [[COPY3:%[0-9]+]]:gpr64common = COPY [[COPY2]]
+  ; DAGISEL-NEXT:   INLINEASM &nop, 1 /* sideeffect attdialect */, 3866633 /* reguse:GPR64common */, [[COPY3]]
+  ; DAGISEL-NEXT: {{  $}}
+  ; DAGISEL-NEXT: bb.2.exit:
+  ; DAGISEL-NEXT:   [[COPY4:%[0-9]+]]:gpr64noip = COPY [[LDRXui]]
+  ; DAGISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY1]], 2, 42, [[COPY4]], implicit-def dead $x17
+  ; DAGISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; DAGISEL-NEXT:   RET_ReallyLR implicit $x0
+  ;
+  ; GISEL-LABEL: name: blend_and_sign_different_bbs
+  ; GISEL: bb.1.entry:
+  ; GISEL-NEXT:   successors: %bb.2(0x50000000), %bb.3(0x30000000)
+  ; GISEL-NEXT:   liveins: $x0, $x1
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x0
+  ; GISEL-NEXT:   [[COPY1:%[0-9]+]]:gpr64 = COPY $x1
+  ; GISEL-NEXT:   [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @discvar
+  ; GISEL-NEXT:   [[LDRXui:%[0-9]+]]:gpr64 = LDRXui [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @discvar :: (dereferenceable load (s64) from @discvar)
+  ; GISEL-NEXT:   [[MOVKXi:%[0-9]+]]:gpr64noip = MOVKXi [[LDRXui]], 42, 48
+  ; GISEL-NEXT:   CBZX [[COPY1]], %bb.3
+  ; GISEL-NEXT:   B %bb.2
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT: bb.2.next:
+  ; GISEL-NEXT:   successors: %bb.3(0x80000000)
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT:   [[COPY2:%[0-9]+]]:gpr64common = COPY [[MOVKXi]]
+  ; GISEL-NEXT:   INLINEASM &nop, 1 /* sideeffect attdialect */, 3866633 /* reguse:GPR64common */, [[COPY2]]
+  ; GISEL-NEXT: {{  $}}
+  ; GISEL-NEXT: bb.3.exit:
+  ; GISEL-NEXT:   [[COPY3:%[0-9]+]]:gpr64noip = COPY [[LDRXui]]
+  ; GISEL-NEXT:   [[PAC:%[0-9]+]]:gpr64 = PAC [[COPY]], 2, 42, [[COPY3]], implicit-def dead $x17
+  ; GISEL-NEXT:   $x0 = COPY [[PAC]]
+  ; GISEL-NEXT:   RET_ReallyLR implicit $x0
+entry:
+  %addrdisc = load i64, ptr @discvar
+  %disc = call i64 @llvm.ptrauth.blend(i64 %addrdisc, i64 42)
+  %cond.b = icmp ne i64 %cond, 0
+  br i1 %cond.b, label %next, label %exit
+
+next:
+  call void asm sideeffect "nop", "r"(i64 %disc)
+  br label %exit
+
+exit:
+  %signed = call i64 @llvm.ptrauth.sign(i64 %addr, i32 2, i64 %disc)
+  ret i64 %signed
+}

@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-imm-modifier-sign branch from a7b265c to ec68c72 Compare July 1, 2025 13:26
@atrosinenko atrosinenko force-pushed the users/atrosinenko/machine-licm-implicit-defs branch from 079d654 to 707a36c Compare July 1, 2025 13:26
@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-imm-modifier-sign branch from ec68c72 to ba9d896 Compare July 3, 2025 16:47
@atrosinenko
Copy link
Contributor Author

Ping.

@atrosinenko atrosinenko requested review from pcc and ojhunt July 9, 2025 14:20
Copy link
Collaborator

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks reasonable. And in parallel with the existing approach of combined resigns. Some minor comments below

@@ -182,6 +182,13 @@ class AArch64TargetLowering : public TargetLowering {
MachineBasicBlock *EmitGetSMESaveSize(MachineInstr &MI,
MachineBasicBlock *BB) const;

/// Replace (0, vreg) discriminator components with the operands of blend
/// or with (immediate, XZR) when possible.
void fixupBlendComponents(MachineInstr &MI, MachineBasicBlock *BB,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably suggest to use a bit more specific naming. As blend also could refer to vector blend operation.

@pcc
Copy link
Contributor

pcc commented Jul 10, 2025

I think this should be opt-in behavior. It might be useful if the goal is to protect the stack, but in cases where the goal is to only protect the heap due to the high cost of protecting the stack, this change would only add overhead.

If you are deploying a comprehensive stack protection, I think something like Darwin's x16/x17 protection would be necessary, so maybe we can use isX16X17Safer() as the control for whether to turn this on?

@atrosinenko atrosinenko changed the base branch from users/atrosinenko/machine-licm-implicit-defs to main July 10, 2025 10:56
@atrosinenko atrosinenko changed the base branch from main to users/atrosinenko/machine-licm-implicit-defs July 10, 2025 10:57
Introduce a pseudo instruction to be selected instead of a pair of
`MOVKXi` and `PAC[DI][AB]` carrying address and immediate modifiers
as separate operands. The new pseudo instruction is expanded in
AsmPrinter, so that MOVKXi is emitted immediately before `PAC[DI][AB]`.
This way, an attacker cannot control the immediate modifier used to sign
the value, even if address modifier can be substituted.

To simplify the instruction selection, select AArch64::PAC pseudo using
TableGen pattern and post-process its $AddrDisc operand by custom
inserter hook - this eliminates duplication of the logic for DAGISel
and GlobalISel. Furthermore, this improves cross-BB analysis in case of
DAGISel.
@atrosinenko atrosinenko changed the base branch from users/atrosinenko/machine-licm-implicit-defs to main July 10, 2025 13:54
@atrosinenko atrosinenko force-pushed the users/atrosinenko/pauth-imm-modifier-sign branch from b4d7b7e to b9ea24b Compare July 10, 2025 13:54
@atrosinenko
Copy link
Contributor Author

@pcc This transformation looks quite light-weight at first. Here is the assembly code that is ultimately produced for ptrauth-isel.ll test source with this patch applied:

Output of llc -mtriple aarch64-linux-gnu -mattr=+pauth -asm-verbose=0 < ptrauth-isel.ll:

        .file   "<stdin>"
        .text
        .globl  small_imm_disc
        .p2align        2
        .type   small_imm_disc,@function
small_imm_disc:
        .cfi_startproc
        mov     x17, #42
        pacda   x0, x17
        ret
.Lfunc_end0:
        .size   small_imm_disc, .Lfunc_end0-small_imm_disc
        .cfi_endproc

        .globl  large_imm_disc_wreg
        .p2align        2
        .type   large_imm_disc_wreg,@function
large_imm_disc_wreg:
        .cfi_startproc
        mov     w8, #24910
        movk    w8, #188, lsl #16
        pacda   x0, x8
        ret
.Lfunc_end1:
        .size   large_imm_disc_wreg, .Lfunc_end1-large_imm_disc_wreg
        .cfi_endproc

        .globl  large_imm_disc_xreg
        .p2align        2
        .type   large_imm_disc_xreg,@function
large_imm_disc_xreg:
        .cfi_startproc
        mov     x8, #57209
        movk    x8, #34317, lsl #16
        movk    x8, #28744, lsl #32
        pacda   x0, x8
        ret
.Lfunc_end2:
        .size   large_imm_disc_xreg, .Lfunc_end2-large_imm_disc_xreg
        .cfi_endproc

        .globl  blend_and_sign_same_bb
        .p2align        2
        .type   blend_and_sign_same_bb,@function
blend_and_sign_same_bb:
        .cfi_startproc
        adrp    x8, discvar
        ldr     x8, [x8, :lo12:discvar]
        movk    x8, #42, lsl #48
        pacda   x0, x8
        ret
.Lfunc_end3:
        .size   blend_and_sign_same_bb, .Lfunc_end3-blend_and_sign_same_bb
        .cfi_endproc

        .globl  blend_and_sign_different_bbs
        .p2align        2
        .type   blend_and_sign_different_bbs,@function
blend_and_sign_different_bbs:
        .cfi_startproc
        adrp    x8, discvar
        ldr     x8, [x8, :lo12:discvar]
        cbz     x1, .LBB4_2
        mov     x9, x8
        movk    x9, #42, lsl #48
        //APP
        nop
        //NO_APP
.LBB4_2:
        movk    x8, #42, lsl #48
        pacda   x0, x8
        ret
.Lfunc_end4:
        .size   blend_and_sign_different_bbs, .Lfunc_end4-blend_and_sign_different_bbs
        .cfi_endproc

        .type   discvar,@object
        .bss
        .globl  discvar
        .p2align        3, 0x0
discvar:
        .xword  0
        .size   discvar, 8

        .section        ".note.GNU-stack","",@progbits

In the above example, only blend_and_sign_different_bbs function could be simplified by rewriting

blend_and_sign_different_bbs:
        adrp    x8, discvar
        ldr     x8, [x8, :lo12:discvar]
        cbz     x1, .LBB4_2
        mov     x9, x8
        movk    x9, #42, lsl #48
        // x9 is input to inline asm //
.LBB4_2:
        movk    x8, #42, lsl #48
        pacda   x0, x8
        ret

like this

blend_and_sign_different_bbs:
        adrp    x8, discvar
        ldr     x8, [x8, :lo12:discvar]
        cbz     x1, .LBB4_2
        movk    x8, #42, lsl #48
        // x8 is input to inline asm //
.LBB4_2:
        pacda   x0, x8
        ret

which saves two instructions (mov and movk).

Aside from subtly affecting register allocation, I expect this patch to add at most two instructions (mov and movk) per pac* like in this example:

define void @test(i64 %addrdisc, i64 %a) {
  %disc = call i64 @llvm.ptrauth.blend(i64 %addrdisc, i64 42)
  %signed0 = call i64 @llvm.ptrauth.sign(i64 %a, i32 0, i64 %disc)
  %b = call i64 asm sideeffect "add $0, $1, 123", "=r,r"(i64 %signed0)
  %signed1 = call i64 @llvm.ptrauth.sign(i64 %b, i32 1, i64 %disc)
  %c = call i64 asm sideeffect "add $0, $1, 123", "=r,r"(i64 %signed1)
  %signed2 = call i64 @llvm.ptrauth.sign(i64 %c, i32 2, i64 %disc)
  %d = call i64 asm sideeffect "add $0, $1, 123", "=r,r"(i64 %signed2)
  %signed3 = call i64 @llvm.ptrauth.sign(i64 %d, i32 3, i64 %disc)
  call void asm sideeffect "nop", "r"(i64 %signed3)
  ret void
}

which is translated into

test:
        mov     x17, x0
        movk    x17, #42, lsl #48
        pacia   x1, x17
        add     x8, x1, #123
        mov     x17, x0
        movk    x17, #42, lsl #48
        pacib   x8, x17
        add     x8, x8, #123
        mov     x17, x0
        movk    x17, #42, lsl #48
        pacda   x8, x17
        add     x8, x8, #123
        movk    x0, #42, lsl #48
        pacdb   x8, x0
        nop
        ret

Without this patch, something along these lines could be emitted:

test:
        movk    x0, #42, lsl #48
        pacia   x1, x0
        add     x8, x1, #123
        pacib   x8, x0
        add     x8, x8, #123
        pacda   x8, x0
        add     x8, x8, #123
        pacdb   x8, x0
        nop
        ret

On the other hand, there is no overhead at all in the most trivial cases, as using PAC pseudo instruction does not require separately computing its discriminator operand with MOVKXi anymore, and mov instruction can be omitted when address discriminator operand is dead after PAC.

Please note that there are quite a few redundant instructions in MIR output generated right after finalize-isel, but they are dead-code-eliminated later. It would improve readability of CHECK lines in ptrauth-isel.ll to run DCE (dead-mi-elimination pass?) before dumping MIR, but there doesn't seem to exist an option to construct an arbitrary machine pipeline, thus I stop right after finalize-isel for the sake of test robustness.

By the way, the same approach is already used for AUT and AUTPAC (which are expanded into longer sequences, though, as it may be desirable to explicitly check the authenticated pointer in case the CPU does not implement FEAT_FPAC) - these are updated in #146489.

On the other hand, it should be rather straightforward to make this behavior opt-in simply by calling fixupBlendComponents function conditionally, but I doubt it is strictly related to isX16X17Safer() in this particular case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants